Abstract

We analyze the role that popularity and novelty play in attracting 
the attention of users to dynamic websites. We do so by determining 
the performance of three different strategies that can be utilized to 
maximize attention. The first one prioritizes novelty while the second
emphasizes popularity. A third strategy looks myopically into
the future and prioritizes stories that are expected to generate the 
most clicks within the next few minutes. We show that the first two 
strategies should be selected on the basis of the rate of novelty decay, 
while the third strategy performs sub-optimally in most cases. We also 
demonstrate that the relative performance of the first two strategies 
as a function of the rate of novelty decay changes abruptly around a 
critical value, resembling a phase transition in the physical world. 
1 Introduction



As millions of people use the web for their social, informational, and 
consumer needs, content providers vie for their limited attention by 
resorting to a number of strategies aimed at maximizing the number 
of clicks devoted to their web sites [1]. These strategies range from 
data personalization and short videos to the dynamic rearrangement 
of items in a given page, to name a few [11 [8] . In all these cases the 
ultimate goal is the same: to draw the attention of the visitor to a 
website before she proceeds to the next one [4]. Obviously, the more
interesting and relevant the site the more valuable it will be to users. 
In addition, since users need to decide among the existing plethora of 
links and sites, their popularities are a determinant of their success, 
for people often click on given links for no other reason than the fact 
that many others do. If we add the fact that without novelty attention 
tends to decay in time, one has a first order list of the requirements 
for capturing people's attention. 

Within this context, we have recently shown that there is a strong 
interplay between novelty and collective attention, which is universally 
manifested in a rather swift initial growth of the number of people 
looking at a new item within a site and its eventual slowdown as 
interest fades among the population [7]. This result suggests that 
ordering the links of a given page by their novelty can guarantee a high 
degree of attention. This is indeed the case in many news websites, 
notably digg.com.

And yet, given the role that popularity plays in attracting the 
attention of users, a natural question arises as to whether alternative 
orderings, like one giving priority to popularity over novelty, might 
not do better at attracting viewers to a site. 

This paper answers this question by taking the dynamics of collective
attention to a finer level of detail and examining the role that
popularity and novelty play in determining the number of clicks within
a given page. In particular, we study three different strategies that
can be deployed in order to maximize attention. The first strategy
prioritizes novelty while the second emphasizes popularity. The third
strategy looks myopically into the future and prioritizes stories that
are expected to generate the most clicks in the next few minutes. We
show that the first two strategies should be selected on the basis of
the rate of novelty decay, while the third strategy performs
sub-optimally in most cases. Most interestingly, we discover that the
relative performance of the first two benchmark strategies as a
function of the rate of novelty decay switches so sharply around some
critical value that it resembles phase transitions observed in the
real world.

The work is organized as follows. We first study the question
of whether or not the location of a link in a page determines the
overall number of clicks in a given time interval. Having answered
this in the affirmative through an empirical study of digg.com, we
then proceed to introduce a set of indices whose values determine
the optimal strategy to be pursued in order to maximize attention to
a page. Using measured values of the rate of decay from digg.com,
we build a realistic simulator and collect statistically significant
data to evaluate each of the indices introduced.

We then study the performance of each of these indices as a function
of the decay rate and show which strategy optimizes viewing for
given values of the decay. Most importantly we compute a full phase 
diagram that indicates at a glance the optimal strategy to use given 
the parameter values of the site. This phase diagram exhibits a sharp 
boundary between the choice of prioritizing novelty over popularity, 
thus resembling a phase transition. 

Finally we summarize our results and discuss their implications for 
the design of dynamic websites. 



2 Location matters

In this section we study how the order in which links are placed within
a webpage (e.g. the news stories of digg.com) determines the number
of clicks within a certain time frame. Assume that time flows discretely
as $t = 0, 1, 2, \dots$ minutes. Let $N_t$ denote the number of clicks,
or digg number, of a digg.com story that appeared on the website $t$
minutes ago (in this case we say that the story has lifetime $t$). As
we showed earlier [7], the growth of $N_t$ satisfies the following
stochastic equation:

$$N_{t+1} = N_t (1 + a r_t X_t), \tag{1}$$

where $r_t$ is a novelty factor that decays with time and satisfies
$r_0 = 1$, $X_t$ is a random variable with mean 1, and $a$ is a positive
constant.

This equation takes into account two important factors that together
determine the growth of collective attention: popularity and
novelty. The popularity effect is captured by the multiplicative form
of Eq. (1), and the novelty effect is described by $r_t$. All other
factors are contained in the noise term $X_t$.

We next take the analysis to a finer level by considering a third
factor, position. A news story displayed at a top position on the front
page easily draws more attention than a similar story placed on later
pages. Hence the growth rate $a r_t$ should depend on the physical
position at which the story is posted.

In the specific case of digg.com, the front page is divided into 15
slots and can therefore display 15 stories at a time. The stories are
always sorted chronologically, with the latest story at the top. If we
label the positions from top to bottom by $i = 1, 2, \dots, 15$, we can
modify Eq. (1) to allow for an explicit dependency of $a$ on $i$:

$$N_{t+1} = N_t (1 + a_i r_t X_t), \tag{2}$$

where $a_i$ is a position factor that decreases with $i$.

The assumption that the novelty effect and the position effect can
be separated into two factors $r_t$ and $a_i$ needs to be tested
empirically. To this end we tracked the growth rate for each slot,
rather than for each story. For multiplicative models it is convenient
to define the logarithmic growth rate

$$s_t = \log N_{t+1} - \log N_t. \tag{3}$$

When $a$ is small (which is always true for short time periods) we have
from Eq. (2)

$$s_t^i \approx a_i r_t X_t \tag{4}$$

for a story placed at position $i$ at time $t$. Taking expectations of
both sides, we have

$$E s_t^i \approx a_i r_t, \tag{5}$$

since $E X_t = 1$.
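
The approximation behind Eq. (4) can be spelled out in one line. The
following is a sketch that uses only Eqs. (2) and (3) and assumes that
$a_i r_t X_t$ is small compared with 1:

```latex
s_t^i = \log N_{t+1} - \log N_t
      = \log\left(1 + a_i r_t X_t\right)
      = a_i r_t X_t - \tfrac{1}{2}\left(a_i r_t X_t\right)^2 + \cdots
      \approx a_i r_t X_t .
```

The neglected terms are of second order in $a_i$, which is why the
approximation is only safe when the growth per time step is small.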

The logarithmic growth rate $s_t^i$ can be measured as follows. For
each fixed position $i$, if a digg story appears at that position at
both times $t$ and $t + 5$ (the front page is refreshed every 5
minutes), then the observed quantity
$\frac{1}{5}(\log N_{t+5} - \log N_t)$ counts as one sample point of
$s_t^i$. Fig. 1(a) plots 1,220 sample points collected from the top
position at various times. Fig. 1(b) is a similar plot for the second
position from the top. By comparing (a) and (b) we see that $s_t^2$
indeed tends to fall below $s_t^1$, which indicates that the position
effect is real. To better illustrate the position effect, we plot the
expected growth rate for positions 1, 3 and 5 in Fig. 2. As can be seen
there, the growth rate decays as the story moves to lower positions.

Figure 1 - The logarithmic growth rate for the top two positions, (a)
and (b), on the front page of digg.com. Time is measured in minutes.
Data is collected every 5 minutes, the rate at which the front page is
refreshed. The solid curve in (a) is the result of a minimum mean
square fit to the data (see text for more details). It has the
functional form $f(t) = 0.120\, e^{-0.4 t^{0.4}}$. The curve in (b) has
the functional form $f(t) = 0.106\, e^{-0.4 t^{0.4}}$.

Figure 2 - The expected logarithmic growth rate for positions 1, 3
and 5 on the front page of digg.com. Time is measured in minutes.
As can be seen, the growth rate decays as the story moves to lower
positions.

From this data we can also determine the values of $a_i$
quantitatively. We already established that for digg.com the precise
functional form of the decay factor is $r_t = e^{-0.4 t^{0.4}}$. Thus,
for these particular values, the minimum mean square estimator of
$a_i$ solves

$$\min_{a_i} \sum_j \left[ s^i_{t_j} - a_i r_{t_j} \right]^2 = \min_{a_i} \sum_j \left[ s^i_{t_j} - a_i e^{-0.4 t_j^{0.4}} \right]^2, \tag{6}$$

where $t_j$ is the lifetime of the $j$'th data point. The estimator for
the 1,220 data points obtained from the top position is calculated to
be $a_1 = 0.120$. The fitted curve $a_1 r_t = 0.120\, e^{-0.4 t^{0.4}}$
is shown as a solid curve in Fig. 1(a). An estimator $a_2 = 0.106$ for
the second position from the top is also calculated and plotted in
Fig. 1(b). As can be seen from those figures, the position effect
($a_i$) and the novelty effect ($r_t$) can indeed be separated. We can
then conclude that Eq. (2) fits the data very well.

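
The least-squares problem for a single position factor has a closed
form: setting the derivative of $\sum_j (s_j - a\, r_{t_j})^2$ with
respect to $a$ to zero gives $a = \sum_j s_j r_{t_j} / \sum_j r_{t_j}^2$.
The sketch below assumes this closed form and checks it on synthetic
samples. The helper names `novelty` and `fit_position_factor` and the
generated data are illustrative only; they are not the digg.com
measurements.

```python
import math
import random


def novelty(t, alpha=0.4, beta=0.4):
    """Stretched-exponential novelty factor r_t = exp(-alpha * t**beta)."""
    return math.exp(-alpha * t ** beta)


def fit_position_factor(lifetimes, growth_rates, alpha=0.4, beta=0.4):
    """Least-squares estimate of a in the model s ~ a * r_t.

    Minimizing sum_j (s_j - a * r_j)**2 over a gives
    a = sum_j s_j * r_j / sum_j r_j**2.
    """
    r = [novelty(t, alpha, beta) for t in lifetimes]
    numerator = sum(s * rj for s, rj in zip(growth_rates, r))
    denominator = sum(rj * rj for rj in r)
    return numerator / denominator


# Synthetic check (assumed data): samples generated with a known a = 0.120
# and noise X_t drawn from a normal distribution with mean 1 and s.d. 0.5.
rng = random.Random(0)
true_a = 0.120
lifetimes = [5 * rng.randrange(0, 60) for _ in range(1220)]
samples = [true_a * novelty(t) * rng.gauss(1.0, 0.5) for t in lifetimes]
estimate = fit_position_factor(lifetimes, samples)
```

Because the model is linear in $a$, no iterative optimizer is needed;
the same function can be applied slot by slot to obtain one estimate
per position.
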
3 Optimal ordering for maximal attention



We now consider the order in which news stories should be displayed
on a web page so as to generate the largest number of clicks within
a certain time period $T$. This time period needs to be finite because
the total number of clicks diverges as $T$ goes to infinity.
Alternatively, in an infinite-horizon framework, we could discount
future clicks with a discount parameter $\delta$, so that one click at
time $t$ counts as $\delta^t$ clicks at time 0. The objective then is to
maximize $\sum_{t=0}^{\infty} \delta^t N_t$, where $N_t$ is the total
number of clicks generated from the news page in period $t$. In what
follows we will consider the finite-horizon objective.

To simplify the problem we confine ourselves to a subset of ordering
strategies called indexing strategies, defined as follows. Given a
story's state, which in our model is just a two-vector $(N_t, t)$, one
first calculates an index $O$ for each story using a predefined index
function $O(N_t, t)$, and then sorts the stories by their indices. The
story with the largest index is displayed at the top, the story with
the second largest index next, and so on.

Rather than considering a general index function we will concentrate
on three simple strategies. While none of them is perfect, each
can increase overall attention to the site.

1. $O_1(N_t, t) = -t$. The stories are sorted by their novelty, with
the newest story at the top. This is what digg.com is doing today.

2. $O_2(N_t, t) = N_t$. The stories are sorted by their popularity,
with the most popular story at the top. This strategy is based on the
fact that attention grows in a multiplicative fashion (popular stories
are more likely to become even more popular).

3. $O_3(N_t, t) = N_t r_t$. This is the "one-step-greedy" strategy.
Ignoring the position effect (assume $a = 1$), a story in state
$(N_t, t)$ generates on average $N_t r_t$ more clicks (or "diggs" if one
considers digg.com) in the next period. This strategy thus places the
most "replicated" story at the top.

Notice that because $N_t$ grows with time, the effect of sorting by
$O_1$ is almost the opposite of sorting according to $O_2$.
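
The three index functions and the sorting rule can be written down
directly. This is a minimal sketch: the function names, the `rank`
helper, and the three example story states are hypothetical, and the
novelty factor uses the digg.com form $r_t = e^{-0.4 t^{0.4}}$.

```python
import math


def novelty(t, alpha=0.4, beta=0.4):
    """Novelty factor r_t = exp(-alpha * t**beta)."""
    return math.exp(-alpha * t ** beta)


def index_novelty(n, t):
    """O_1: newest story first."""
    return -t


def index_popularity(n, t):
    """O_2: most popular story first."""
    return n


def index_greedy(n, t):
    """O_3: largest expected number of new clicks in the next period."""
    return n * novelty(t)


def rank(stories, index):
    """Sort (N_t, t) states so that the largest index takes the top slot."""
    return sorted(stories, key=lambda s: index(*s), reverse=True)


# Hypothetical story states (N_t, t): a brand-new story, a young story
# with a few diggs, and an old story with many diggs.
stories = [(1.0, 0), (40.0, 600), (6.0, 15)]
by_novelty = rank(stories, index_novelty)
by_popularity = rank(stories, index_popularity)
by_greedy = rank(stories, index_greedy)
```

For these example states the novelty and popularity orderings are exact
reverses of each other, while the greedy index puts the young story
with a few diggs on top.
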
In order to test these strategies, we built a simulator that closely
resembles the functioning of digg.com in that it incorporates the
following rules:

1. Initially there are 15 stories, all in state $(N_t, t) = (1, 0)$. In
words, each story starts with 1 digg and lifetime 0. (Because our model
is purely multiplicative, the initial digg number does not matter.
We just set it to be 1.)

2. Allocate the 15 stories to 15 positions, in decreasing order of
their $O(N_t, t)$, for any given index function $O$.

3. Time evolves one step (5 minutes) at a time. The number of
diggs generated from a story at position $i$ is given by

$$\Delta N_{t+5} = N_{t+5} - N_t = 5 a_i r_t X_t N_t. \tag{7}$$

The total number of diggs generated in this time step is the sum
of 15 such numbers.

The values of $a_i$ were estimated from real data and are shown in
Fig. 3. The novelty factor is $r_t = e^{-0.4 t^{0.4}}$. $X_t$ is randomly
drawn from a normal distribution with mean 1 and standard deviation
0.5 (obtained from the real data from digg.com).

4. On average a new story arrives every 20 minutes. Thus the
number of stories arriving in one time step (5 minutes) follows a
Poisson distribution with mean 0.25. When a new story enters
the pool, the story with the lowest index is dropped, maintaining
15 stories in total. (It is possible that a new story is dropped
immediately after its arrival if it happens to have the lowest
index.)

5. Go back to Step 2 until the loop has been repeated for enough
rounds.
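
The five rules above can be sketched as a short program. This is an
illustration under stated assumptions, not the simulator used for the
reported numbers: the measured position factors of Fig. 3 are not
available in the text, so `POSITION` is a placeholder straight line
from 0.120 down to 0.040 (mean 0.08), and the function names are
hypothetical.

```python
import math
import random

ALPHA, BETA = 0.4, 0.4      # novelty decay r_t = exp(-ALPHA * t**BETA)
SLOTS, STEP = 15, 5         # front-page slots, minutes per time step
ARRIVAL_MEAN = 0.25         # mean number of new stories per time step
# ASSUMPTION: placeholder position factors, linear from 0.120 to 0.040.
# The measured values of Fig. 3 are NOT reproduced here.
POSITION = [0.120 - 0.080 * i / (SLOTS - 1) for i in range(SLOTS)]


def novelty(t):
    return math.exp(-ALPHA * t ** BETA)


def poisson(rng, mean):
    """Knuth's multiplication method; adequate for a small mean."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate(index, rounds, seed=0):
    """Total diggs generated over `rounds` steps under an index function."""
    rng = random.Random(seed)
    stories = [[1.0, 0] for _ in range(SLOTS)]        # rule 1: state [N_t, t]
    total = 0.0
    for _ in range(rounds):
        # Rule 2: largest index takes the top slot.
        stories.sort(key=lambda s: index(s[0], s[1]), reverse=True)
        # Rule 3: each story gains 5 * a_i * r_t * X_t * N_t diggs.
        for a_i, story in zip(POSITION, stories):
            x = rng.gauss(1.0, 0.5)
            gain = STEP * a_i * novelty(story[1]) * x * story[0]
            story[0] += gain
            story[1] += STEP
            total += gain
        # Rule 4: each arrival pushes out the story with the lowest index,
        # which may be the new story itself.
        for _ in range(poisson(rng, ARRIVAL_MEAN)):
            stories.append([1.0, 0])
            stories.remove(min(stories, key=lambda s: index(s[0], s[1])))
    return total                                       # rule 5: loop ended
```

A call such as `simulate(lambda n, t: -t, rounds=2000)` runs the
novelty ordering, and `simulate(lambda n, t: n, rounds=2000)` runs the
popularity ordering. Because the position factors are placeholders,
the totals should not be expected to match the figures reported in the
text.
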

The performance of all three index functions was tested in our
simulator. For each index function, Steps 2 to 5 were repeated 100,000
times (or equivalently 500,000 minutes). Strategy $O_1$ (sort by
novelty) achieved a total of 514,314.8 diggs. Strategy $O_2$ (sort by
popularity) only generated 354.6 diggs. Strategy $O_3$
(one-step-greedy) generated 452,402.3 diggs. Thus for these parameter
values $O_1$ turns out to be the best strategy, since it is 13.7%
better than $O_3$ and tremendously better than $O_2$. This confirms
that digg.com is using the right strategy.

Figure 3 - The position factor $a_i$ decays as the position lowers.
The values of $a_i$ are measured by tracking the 15 slots on digg.com's
front page.

The reason for the poor performance of the index $O_2$ is easy to
understand. $O_2$ gives higher priority to stories that have been dugg
many times. According to the indexing rule, after one period new
stories can never find their way to the front page since all the old
stories have more than 1 digg! When novelty decays fast, the old
stories remaining on the front page soon lose their freshness and cease
to generate any new diggs. The system thus gets frozen in an unfruitful
state.

The fact that $O_1$ outperforms $O_3$ is a bit harder to understand.
Some intuition can be gained by considering an extreme case. Suppose
each story completely loses its novelty after one second ($r_0 = 1$,
$r_t = 0$ for all $t > 0$). Then only "new arrivals" should be displayed
since they are the only ones that can generate new diggs. Sorting
stories by their lifetime is a good idea when novelty decays fast. On
the other hand, if novelty never decays ($r_t = 1$), the lifetime factor
becomes irrelevant. Thus in this case, strategy $O_3$, which
prioritizes popular stories, will win over $O_1$. Hence, the fact that
$O_1$ works better than $O_3$ in our simulations shows that novelty
decays relatively fast for digg.com. Should it decay at a slower rate,
$O_3$ would be a better choice.

We point out that our simulation only showed that the ordering
implied by $O_1$ works better than $O_3$ for a particular choice of
$T$. In general this may not be true for other values of $T$. In fact,
for a time interval of $T = 5$ minutes (one time step) $O_3$ is by
definition the best strategy. Hence, comparing the performance of two
or more index functions only makes sense after one has specified a time
horizon (or how much the future should be discounted if an infinite
horizon is assumed).

In order to quantitatively test the limiting behavior of the three
strategies, we repeated our simulations for a range of different values
of the decay parameter of $r_t$. Our previous work suggested that $r_t$
decays as a stretched exponential function, whose general form can be
written as $r_t = e^{-\alpha t^{\beta}}$. For digg.com it turns out that
$\alpha = \beta = 0.4$. The parameter $\beta$ determines the decay rate.
For fixed $\alpha$, the larger $\beta$, the faster $r_t$ decays. We
repeated our experiment for $\alpha = 0.4$ and
$\beta \in [0.30, 0.45]$. The result is shown in Fig. 4. The performance
of each indexing strategy is measured by the logarithm of the total
number of diggs generated in 10,000 rounds. We see that as $\beta$
increases (faster decay), the number of diggs decreases for all three
indexing strategies. When $\beta > 0.34$, $O_1$ performs slightly better
than $O_3$ and much better than $O_2$. When $\beta < 0.33$, however,
$O_3$ and $O_2$ perform significantly better than $O_1$. In other
words, on the two sides of the value $\beta = 0.335$, the stories should
be displayed in completely reversed order! We therefore say that a
phase transition takes place at $\beta = 0.335$.

Other points worth mentioning are that in Fig. 4 $O_3$ asymptotically
approaches $O_1$ and $O_2$ in the fast and slow decay limits,
respectively, and that in general $O_3$ is the best index among the
three strategies (although for the specific parameters of digg.com
($\alpha = \beta = 0.4$) and our particular time horizon $O_1$ is
slightly better). This is because $O_3$ trades off between popularity
and novelty instead of betting on only one factor.
To see this, consider the equivalent index function

$$O_3'(N_t, t) = \log O_3(N_t, t) = \log N_t + \log r_t. \tag{8}$$

Clearly, $O_3'$ linearly trades off between $\log N_t$ and $\log r_t$,
assigning identical weight to the two effects. This is by no means the
best tradeoff. For example, the index function

$$O_4(N_t, t) = 0.6 \log N_t + \log r_t \tag{9}$$

achieves 556,444.1 diggs after 100,000 rounds of simulation, which is
8.2% more than $O_1$ and 23.0% more than $O_3$! Why the term
$\log N_t$ should receive weight 0.6 rather than 1, arbitrary as that
may seem, is beyond the scope of this paper, but it does show the
complexity of our problem. These experiments demonstrate that the
novelty decay rate needs to be measured with great care, as a slight
change in the decay rate may totally reverse the optimal order needed
to maximize attention.

It is usually hard to analytically compute the performance of a
general index function. For the two simple strategies $O_1$ and $O_2$,
however, some rough estimate can be achieved. For the sake of
generality, assume that there are $m$ positions on the front page. New
stories arrive at a rate $\lambda > 0$. Novelty decays as
$r_t = e^{-\alpha t^{\beta}}$, where $0 < \beta < 1$. Let
$a = \frac{1}{m} \sum_{i=1}^{m} a_i$ be the average position factor,
which equals 0.08 for digg.com. Let $\Delta t$ be the refresh time step,
which is 5 minutes for digg.com.

Consider strategy $O_2$ first. According to the index rule, new stories
never appear on the front page. All diggs are generated by the initial
$m$ stories. After time $T$ we have from Eq. (3) that

$$\log N_T = \sum_{t = 0, \Delta t, \dots, T - \Delta t} a_i r_t X_t \Delta t. \tag{10}$$

Hence on average each story's log-performance is

$$E \log N_T = \sum_{t = 0, \Delta t, \dots, T - \Delta t} a r_t \Delta t \approx a \int_0^T r_t \, dt. \tag{11}$$

When $T$ is large, we have

$$E \log N_T \approx a \int_0^{\infty} r_t \, dt. \tag{12}$$

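
For the stretched exponential $r_t = e^{-\alpha t^{\beta}}$ the
large-$T$ estimate of $E \log N_T$ under $O_2$, namely
$a \int_0^{\infty} r_t \, dt$, can be evaluated in closed form: the
substitution $u = \alpha t^{\beta}$ gives
$\int_0^{\infty} r_t \, dt = \Gamma(1/\beta) / (\beta \alpha^{1/\beta})$.
The sketch below evaluates this expression for the digg.com parameters
quoted in the text; the function names are illustrative.

```python
import math


def total_novelty(alpha, beta):
    """Integral of exp(-alpha * t**beta) over t from 0 to infinity.

    With u = alpha * t**beta this equals Gamma(1/beta) / (beta * alpha**(1/beta)).
    """
    return math.gamma(1.0 / beta) / (beta * alpha ** (1.0 / beta))


def expected_log_clicks_popularity(a, alpha, beta):
    """Large-T estimate of E log N_T per story under the popularity index."""
    return a * total_novelty(alpha, beta)


# digg.com values quoted in the text: a = 0.08, alpha = beta = 0.4.
estimate = expected_log_clicks_popularity(0.08, 0.4, 0.4)
```

The estimate is finite for any $\beta > 0$, which reflects the frozen
state described earlier: under $O_2$ the initial stories exhaust their
novelty and the total stops growing.
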
Figure 4 - The total number of diggs generated using three ordering
strategies $O_1$, $O_2$, and $O_3$, for $\alpha = 0.4$ and a range of
$\beta$. The novelty factor decays as $r_t = e^{-\alpha t^{\beta}}$.
Performance is measured by the logarithm of the total number of diggs
generated in 10,000 time steps. As can be seen, $O_3$ asymptotically
approaches $O_1$ and $O_2$ in the fast decay (large $\beta$) and slow
decay (small $\beta$) limit, respectively. A phase transition happens
around $\beta = 0.335$.

Next consider $O_1$, which orders the stories by their lifetime. On
average every $s = 1/\lambda$ minutes a new story replaces an old story,
and each old story moves down one position. Hence on average each story
stays on the front page for $ms$ minutes, where $m$ is the number of
positions. We call $ms$ one page cycle. It is the average time it takes
to refresh the whole page. We now see that, before a story disappears
from the front page, it generates

$$N_{ms} = \exp\left( \sum_{t = 0, \Delta t, \dots, ms - \Delta t} a_{i(t)} r_t X_t \Delta t \right) \tag{13}$$

diggs, where $i(t)$ is the story's position at time $t$. When a story
gets replaced by a new story, they are counted as one story restarting
from the state $N_t = 1$ and $t = 0$. The multiplicative process starts
over, and another $N_{ms}$ diggs are generated in the next $ms$ minutes,
on average. Thus, in a total time period $T$ the process is repeated
$T/(ms)$ times, and a total of $N_{ms} T/(ms)$ diggs are generated per
story. The log-performance of $O_1$ is approximately

$$\log N_{ms} + \log\left(\frac{T}{ms}\right) = \sum_{t = 0, \Delta t, \dots, ms - \Delta t} a r_t X_t \Delta t + \log\left(\frac{T}{ms}\right), \tag{14}$$

where we replaced $a_{i(t)}$ by $a$ since on average each story stays in
positions $1, \dots, m$ for equal times. Taking expectations on both
sides, we have

$$E \log N_{ms} + \log\left(\frac{T}{ms}\right) \approx a \int_0^{ms} r_t \, dt + \log\left(\frac{T}{ms}\right). \tag{15}$$

The critical point can be determined by equating Eq. (12) and
Eq. (15):

$$E \log N_T - E \log N_{ms} = \log T - \log(ms), \tag{16}$$

or

$$a \int_{ms}^{\infty} r_t \, dt = \log\left(\frac{T}{ms}\right), \tag{17}$$

which holds for any functional form of $r_t$. The left side of Eq. (16)
can be interpreted as the total novelty left after a time $ms$, or the
total log-performance that can be gained from one story after one page
cycle. The right hand side of Eq. (16) is the total log-time left after
one page cycle. Thus, Eq. (16) and (17) say that, after one page cycle,
if there is more novelty left than log-time remaining, the stories
should be ordered by decreasing popularity rather than by decreasing
novelty ($O_2$ is better than $O_1$). Conversely, if novelty decays too
fast (not enough novelty left after one page cycle), then the stories
should be ordered by decreasing novelty rather than decreasing
popularity ($O_1$ is better than $O_2$).

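When $r_t = e^{-\alpha t^{\beta}}$ it holds that

$$\int_{ms}^{\infty} r_t \, dt = \frac{1}{\beta \alpha^{1/\beta}} \, \Gamma\left(\frac{1}{\beta}, \alpha (ms)^{\beta}\right), \tag{18}$$

where

$$\Gamma(z, x) = \int_x^{\infty} u^{z-1} e^{-u} \, du \tag{19}$$

is the incomplete Gamma function. In this case the critical equation
can also be written as

$$\frac{a}{\beta \alpha^{1/\beta}} \, \Gamma\left(\frac{1}{\beta}, \alpha (ms)^{\beta}\right) = \log\left(\frac{T}{ms}\right). \tag{20}$$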

For the parameters of digg.com ($a = 0.08$, $m = 15$, $s = 20$) and
horizon $T = 50,000$ one can solve for the critical curve
$(\alpha, \beta)$ on which $O_1$ and $O_2$ have the same performance.
The curve is shown in Fig. 5 as a phase diagram. When the parameters
$(\alpha, \beta)$ lie above the critical curve, the stories should be
sorted by $O_1$. Otherwise they should be sorted by $O_2$.
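
One point on the critical curve can be located numerically. The sketch
below solves $a \int_{ms}^{\infty} e^{-\alpha t^{\beta}} dt = \log(T/(ms))$
for $\beta$ at fixed $\alpha$, using only the standard library. The tail
integral is computed with the substitution $u = \alpha t^{\beta}$ and
Simpson's rule rather than a library incomplete Gamma routine. The
function names, the bisection bracket $[0.2, 0.6]$, and the integration
settings are assumptions made for illustration.

```python
import math


def tail_novelty(x, alpha, beta, steps=4000, span=60.0):
    """Integral of exp(-alpha * t**beta) for t from x to infinity.

    With u = alpha * t**beta the integral becomes
    (1 / (beta * alpha**(1/beta))) * int_{alpha x**beta}^inf u**(1/beta - 1) e**(-u) du.
    The u-integral is truncated at lower + span and evaluated with
    Simpson's rule (steps must be even).
    """
    k = 1.0 / beta
    lower = alpha * x ** beta
    h = span / steps

    def f(u):
        return u ** (k - 1.0) * math.exp(-u)

    acc = f(lower) + f(lower + span)
    for j in range(1, steps):
        acc += (4 if j % 2 else 2) * f(lower + j * h)
    return acc * h / 3.0 / (beta * alpha ** k)


def gap(beta, alpha, a, m, s, horizon):
    """Left side minus right side of the critical equation."""
    return a * tail_novelty(m * s, alpha, beta) - math.log(horizon / (m * s))


def critical_beta(alpha, a=0.08, m=15, s=20, horizon=50_000, lo=0.2, hi=0.6):
    """Bisection on beta; assumes gap(lo) > 0 > gap(hi)."""
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if gap(mid, alpha, a, m, s, horizon) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


beta_c = critical_beta(0.4)   # critical beta for alpha = 0.4
```

Repeating the call over a grid of $\alpha$ values traces out a curve
that can be compared with the phase diagram. Bisection is used because,
for page cycles longer than one minute, the left side decreases
monotonically in $\beta$.
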

To illustrate how sharp the phase transition is, we plot the relative
performance $O_2/(O_1 + O_2)$ as a function of $\beta$, for fixed
$\alpha = 0.4$, in Fig. 6. As can be seen, the transition is indeed very
sharp.

4 Conclusion 

In this paper we have shown that depending on the rate of decay of 
novelty, two different strategies can be deployed in order to maximize 
attention. The first one prioritizes novelty while the second
emphasizes popularity. Most interestingly, the shift from one to the other
as a function of the rate of decay is extremely sharp, resembling the 
phase transitions observed in the physical world. 



Figure 5 - The phase diagram. The critical curve is calculated by
solving Eq. (20) with $a = 0.08$, $m = 15$, $s = 20$ and $T = 50,000$.
When $(\alpha, \beta)$ lies in the upper half of the phase diagram $O_1$
works better than $O_2$. Otherwise $O_2$ works better.

Figure 6 - The relative performance $O_1/(O_1 + O_2)$ as a function of
$\beta$, for fixed $\alpha = 0.4$.

These results were obtained by focusing on the dynamics of collective
attention and examining the role that popularity and novelty
play in determining the number of clicks within a given page. In
particular, we analyzed three different strategies that can be deployed
in order to maximize attention. The first strategy prioritizes novelty
while the second emphasizes popularity. The third strategy looks
myopically into the future and prioritizes stories that are expected to
generate the most clicks in the next few minutes. We then showed
that the first two strategies should be selected on the basis of the rate
of novelty decay, while the third strategy performs sub-optimally in
most cases. Most interestingly, we discovered that the relative
performance of the first two benchmark strategies as a function of the
rate of novelty decay switches so sharply around some critical value
that it resembles phase transitions observed in the real world.

Given the importance of maximizing page views for most content 
providers, this work suggests a principled way of choosing what to 
prioritize when designing dynamic websites. Knowledge of the rates 
with which novelty and popularity evolve within the website can then 
be translated into decisions as to what to show first, second, etc. 